


Kernel Regression in Structured Non-IID Settings: Theory and Implications for Denoising Score Learning

Zhang, Dechen, Shi, Zhenmei, Zhang, Yi, Liang, Yingyu, Zou, Difan

arXiv.org Machine Learning

Kernel ridge regression (KRR) is a foundational tool in machine learning, with recent work emphasizing its connections to neural networks. However, existing theory primarily addresses the i.i.d. setting, while real-world data often exhibits structured dependencies - particularly in applications like denoising score learning where multiple noisy observations derive from shared underlying signals. We present the first systematic study of KRR generalization for non-i.i.d. data with signal-noise causal structure, where observations represent different noisy views of common signals. By developing a novel blockwise decomposition method that enables precise concentration analysis for dependent data, we derive excess risk bounds for KRR that explicitly depend on: (1) the kernel spectrum, (2) causal structure parameters, and (3) sampling mechanisms (including relative sample sizes for signals and noises). We further apply our results to denoising score learning, establishing generalization guarantees and providing principled guidance for sampling noisy data points. This work advances KRR theory while providing practical tools for analyzing dependent data in modern machine learning applications.
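
As a quick reference for the estimator under study, here is a minimal sketch of plain kernel ridge regression in the i.i.d. setting (a standard RBF-kernel implementation, not the paper's non-i.i.d. analysis; the kernel choice, bandwidth, and regularization values are illustrative assumptions):

```python
import numpy as np

def rbf_kernel(X, Z, bandwidth=1.0):
    """Gaussian (RBF) kernel matrix: K[i, j] = exp(-||x_i - z_j||^2 / (2 h^2))."""
    sq_dists = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dists / (2.0 * bandwidth ** 2))

def krr_fit(X, y, lam=1e-3, bandwidth=0.3):
    """Solve (K + n * lam * I) alpha = y for the dual coefficients."""
    n = X.shape[0]
    K = rbf_kernel(X, X, bandwidth)
    return np.linalg.solve(K + n * lam * np.eye(n), y)

def krr_predict(X_train, alpha, X_test, bandwidth=0.3):
    """Evaluate f(x) = sum_i alpha_i * k(x_i, x) at the test points."""
    return rbf_kernel(X_test, X_train, bandwidth) @ alpha

# Toy usage: noisy observations of a shared underlying signal.
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(200, 1))
y = np.sin(3.0 * X[:, 0]) + 0.1 * rng.standard_normal(200)
alpha = krr_fit(X, y)
X_test = np.linspace(-1.0, 1.0, 5)[:, None]
print(krr_predict(X, alpha, X_test))
```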



On the Asymptotic Learning Curves of Kernel Ridge Regression under Power-law Decay

Neural Information Processing Systems

The widely observed 'benign overfitting' phenomenon in the neural network literature challenges the 'bias-variance trade-off' doctrine of statistical learning theory. Since the generalization ability of the 'lazy trained' over-parametrized neural network can be well approximated by that of the neural tangent kernel regression, the learning curve (i.e., the curve of the excess risk) of kernel ridge regression has attracted increasing attention.


On the Saturation Effects of Spectral Algorithms in Large Dimensions

Lu, Weihao, Zhang, Haobo, Li, Yicheng, Lin, Qian

arXiv.org Machine Learning

The saturation effects, which originally refer to the fact that kernel ridge regression (KRR) fails to achieve the information-theoretical lower bound when the regression function is over-smooth, have been observed for almost 20 years and were rigorously proved recently for kernel ridge regression and some other spectral algorithms over a fixed dimensional domain. The main focus of this paper is to explore the saturation effects for a large class of spectral algorithms (including KRR, gradient descent, etc.) in large dimensional settings where the sample size $n$ and dimension $d$ satisfy $n \asymp d^{\gamma}$. More precisely, we first propose an improved minimax lower bound for the kernel regression problem in large dimensional settings and show that gradient flow with an early stopping strategy results in an estimator achieving this lower bound (up to a logarithmic factor). Similar to the results in KRR, we can further determine the exact convergence rates (both upper and lower bounds) of a large class of (optimally tuned) spectral algorithms with different qualifications $\tau$. In particular, we find that these exact rate curves (varying along $\gamma$) exhibit periodic plateau behavior and a polynomial approximation barrier. Consequently, we can fully depict the saturation effects of the spectral algorithms and reveal a new phenomenon in large dimensional settings: the saturation effect occurs in the large dimensional setting as long as the source condition $s>\tau$, while it occurs in the fixed dimensional setting only when $s>2\tau$.
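
The qualification $\tau$ can be made concrete through the standard filter-function view of spectral algorithms. The sketch below uses textbook filter definitions (not code from the paper): the KRR filter has qualification $\tau = 1$, while the gradient-flow filter has arbitrarily high qualification, which is why early-stopped gradient flow can escape saturation:

```python
import numpy as np

# Standard spectral filter functions; the qualification tau bounds how fast
# the residual 1 - mu * phi(mu) can decay as the regularization is tuned.
def krr_filter(mu, lam):
    """Kernel ridge regression: phi_lam(mu) = 1 / (mu + lam), qualification tau = 1."""
    return 1.0 / (mu + lam)

def gradient_flow_filter(mu, t):
    """Gradient flow stopped at time t: phi_t(mu) = (1 - exp(-t * mu)) / mu,
    arbitrarily high qualification."""
    return (1.0 - np.exp(-t * mu)) / mu

# Residual r(mu) = 1 - mu * phi(mu): the fraction of a signal component with
# eigenvalue mu that the estimator fails to fit. KRR's residual decays only
# linearly in mu, while gradient flow's decays exponentially.
mu = np.logspace(-4, 0, 5)
lam, t = 1e-2, 1e2  # roughly matched regularization strengths (t ~ 1/lam)
print("KRR residual:          ", 1.0 - mu * krr_filter(mu, lam))
print("gradient-flow residual:", 1.0 - mu * gradient_flow_filter(mu, t))
```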


Generalization Error Curves for Analytic Spectral Algorithms under Power-law Decay

Li, Yicheng, Gan, Weiye, Shi, Zuoqiang, Lin, Qian

arXiv.org Artificial Intelligence

The neural tangent kernel (NTK) theory (Jacot et al., 2018), which shows that gradient kernel regression well approximates the over-parametrized neural network trained by gradient descent (Jacot et al., 2018; Allen-Zhu et al., 2019; Lee et al., 2019), gives us a natural surrogate for understanding the generalization behavior of neural networks in certain circumstances. This surrogate has led to a recent renaissance in the study of kernel methods. For example, one may ask whether overfitting can harm generalization (Bartlett et al., 2020), how the smoothness of the underlying regression function affects the generalization error (Li et al., 2023), or whether one can determine a lower bound on the generalization error for a specific function. All these questions can be answered by the generalization error curve, which aims to determine the exact generalization error of a given kernel regression method with respect to the kernel, the regression function, the noise level, and the choice of the regularization parameter. Such a generalization error curve provides a comprehensive picture of the generalization ability of the corresponding kernel regression method (Bordelon et al., 2020; Cui et al., 2021; Li et al., 2023).
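
To make the notion of a generalization error curve concrete, the snippet below evaluates the classical fixed-design bias-variance proxy for KRR in a power-law sequence model (a simplified heuristic in the spirit of Bordelon et al., 2020 and Cui et al., 2021, not the exact curves derived in the paper; the decay exponents, noise level, and the $\lambda = 1/n$ tuning rule are illustrative assumptions):

```python
import numpy as np

def krr_risk_proxy(n, lam, beta=2.0, s=2.0, sigma2=0.25, M=100_000):
    """Fixed-design bias^2 + variance proxy for KRR under power-law decay.

    Kernel eigenvalues lambda_i = i^{-beta}, target coefficients f_i = i^{-s},
    noise variance sigma2, ridge parameter lam, truncation level M.
    """
    i = np.arange(1, M + 1, dtype=float)
    lam_i = i ** (-beta)          # kernel spectrum
    f_i = i ** (-s)               # regression-function coefficients
    bias2 = np.sum((lam / (lam_i + lam)) ** 2 * f_i ** 2)
    variance = sigma2 / n * np.sum((lam_i / (lam_i + lam)) ** 2)
    return bias2 + variance

# Trace out a learning curve by sweeping the sample size with lam = 1/n.
for n in [10**2, 10**3, 10**4, 10**5]:
    print(f"n = {n:>6d}   risk proxy = {krr_risk_proxy(n, lam=1.0 / n):.3e}")
```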


Distributed Gradient Descent for Functional Learning

Yu, Zhan, Fan, Jun, Zhou, Ding-Xuan

arXiv.org Artificial Intelligence

In recent years, distributed learning schemes have received increasing attention for their strong advantages in handling large-scale data. To face the big data challenges that have recently emerged in functional data analysis, we propose a novel distributed gradient descent functional learning (DGDFL) algorithm to tackle functional data across numerous local machines (processors) in the framework of reproducing kernel Hilbert spaces. Based on integral operator approaches, we provide the first theoretical understanding of the DGDFL algorithm in many different aspects. As a first step toward understanding DGDFL, we propose and comprehensively study a data-based gradient descent functional learning (GDFL) algorithm associated with a single-machine model. Under mild conditions, we obtain confidence-based optimal learning rates for DGDFL without the saturation boundary on the regularity index suffered in previous works on functional regression. We further provide a semi-supervised DGDFL approach that weakens the restriction on the maximal number of local machines needed to ensure optimal rates. To the best of our knowledge, DGDFL provides the first distributed iterative training approach to functional learning and enriches the study of functional data analysis.
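
As a minimal illustration of the divide-and-conquer principle behind distributed schemes such as DGDFL, the sketch below trains local estimators by gradient descent on disjoint data shards and averages them; a plain linear least-squares model stands in for the paper's RKHS functional estimator, and all names and constants are illustrative assumptions:

```python
import numpy as np

def local_gradient_descent(X, y, steps=200, lr=0.1):
    """Gradient descent on the local least-squares objective
    (1 / 2n) * ||X w - y||^2; stands in for a local GDFL-style estimator."""
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(steps):
        w -= lr * (X.T @ (X @ w - y) / n)
    return w

def distributed_gd(X, y, num_machines=4, steps=200, lr=0.1):
    """Divide-and-conquer: train on disjoint shards, then average the local
    estimators -- the communication pattern used by distributed learning."""
    local_ws = [local_gradient_descent(Xs, ys, steps, lr)
                for Xs, ys in zip(np.array_split(X, num_machines),
                                  np.array_split(y, num_machines))]
    return np.mean(local_ws, axis=0)

# Toy usage: think of each row as a functional covariate discretized on a grid.
rng = np.random.default_rng(1)
X = rng.standard_normal((800, 20))
w_true = rng.standard_normal(20)
y = X @ w_true + 0.1 * rng.standard_normal(800)
w_hat = distributed_gd(X, y)
print("estimation error:", np.linalg.norm(w_hat - w_true))
```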